AITopics | compounding error

Collaborating Authors

compounding error

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Distributional Successor Features Enable Zero-Shot Policy Optimization

Neural Information Processing SystemsMar-22-2026, 17:34:47 GMT

Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new rewards to linear regression. Yet, policy optimization with successor features can be challenging. This work proposes a novel class of models, i.e., Distributional Successor Features for Zero-Shot Policy Optimization (DiSPOs), that learn a distribution of successor features of a stationary dataset's behavior policy, along with a policy that acts to realize different successor features within the dataset. By directly modeling long-term outcomes in the dataset, DiSPOs avoid compounding error while enabling a simple scheme for zero-shot policy optimization across reward functions. We present a practical instantiation of DiSPOs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems.

large language model, machine learning, natural language, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

Distributional Successor Features Enable Zero-Shot Policy Optimization

Neural Information Processing SystemsFeb-18-2026, 10:42:50 GMT

Intelligent agents must be generalists, capable of quickly adapting to various tasks.

large language model, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Austria (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(9 more...)

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(5 more...)

Add feedback

Time-series Generation by Contrastive Imitation

Neural Information Processing SystemsDec-25-2025, 05:52:15 GMT

Consider learning a generative model for time-series data. The sequential setting poses a unique challenge: Not only should the generator capture the dynamics of (stepwise) transitions, but its open-loop rollouts should also preserve the distribution of (multi-step) trajectories. On one hand, autoregressive models trained by MLE allow learning and computing explicit transition distributions, but suffer from compounding error during rollouts. On the other hand, adversarial models based on GAN training alleviate such exposure bias, but transitions are implicit and hard to assess. In this work, we study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking), where the reinforcement signal is provided by a global (but stepwise-decomposable) trained by contrastive estimation.

contrastive imitation, name change, time-series generation, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

Learning World Models for Interactive Video Generation

Chen, Taiye, Hu, Xun, Ding, Zihan, Jin, Chi

arXiv.org Artificial IntelligenceOct-31-2025

Foundational world models must be both interactive and preserve spatiotemporal coherence for effective future planning with action choices. However, present models for long video generation have limited inherent world modeling capabilities due to two main challenges: compounding errors and insufficient memory mechanisms. We enhance image-to-video models with interactive capabilities through additional action conditioning and autoregressive framework, and reveal that compounding error is inherently irreducible in autoregressive video generation, while insufficient memory mechanism leads to incoherence of world models. We propose video retrieval augmented generation (VRAG) with explicit global state conditioning, which significantly reduces long-term compounding errors and increases spatiotemporal consistency of world models. In contrast, naive autoregressive generation with extended context windows and retrieval-augmented generation prove less effective for video generation, primarily due to the limited in-context learning capabilities of current video models. Our work illuminates the fundamental challenges in video world models and establishes a comprehensive benchmark for improving video generation models with internal world modeling capabilities.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.21996

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback

Distributional Successor Features Enable Zero-Shot Policy Optimization

Neural Information Processing SystemsOct-10-2025, 19:19:56 GMT

Intelligent agents must be generalists, capable of quickly adapting to various tasks.

dataset, dispo, international conference, (13 more...)

Neural Information Processing Systems

Country:

Europe > Austria (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(9 more...)

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(5 more...)

Add feedback

b048dd19ba6d85b9066aa93b4de9ad4a-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 04:58:14 GMT

objective, random feature, reward function, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > Canada > British Columbia > Vancouver (0.04)
(12 more...)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

b048dd19ba6d85b9066aa93b4de9ad4a-Paper-Conference.pdf

Neural Information Processing SystemsSep-28-2025, 19:49:08 GMT

objective, random feature, reward function, (12 more...)

Neural Information Processing Systems

Country:

Europe > Sweden (0.14)
Africa > Ethiopia (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
(7 more...)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

From Imitation to Optimization: A Comparative Study of Offline Learning for Autonomous Driving

Guillen-Perez, Antonio

arXiv.org Artificial IntelligenceAug-28-2025

Learning robust driving policies from large-scale, real-world datasets is a central challenge in autonomous driving, as online data collection is often unsafe and impractical. While Behavioral Cloning (BC) offers a straightforward approach to imitation learning, policies trained with BC are notoriously brittle and suffer from compounding errors in closed-loop execution. This work presents a comprehensive pipeline and a comparative study to address this limitation. We first develop a series of increasingly sophisticated BC baselines, culminating in a Transformer-based model that operates on a structured, entity-centric state representation. While this model achieves low imitation loss, we show that it still fails in long-horizon simulations. We then demonstrate that by applying a state-of-the-art Offline Reinforcement Learning algorithm, Conservative Q-Learning (CQL), to the same data and architecture, we can learn a significantly more robust policy. Using a carefully engineered reward function, the CQL agent learns a conservative value function that enables it to recover from minor errors and avoid out-of-distribution states. In a large-scale evaluation on 1,000 unseen scenarios from the Waymo Open Motion Dataset, our final CQL agent achieves a 3.2x higher success rate and a 7.4x lower collision rate than the strongest BC baseline, proving that an offline RL approach is critical for learning robust, long-horizon driving policies from static expert data.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2508.07029

Country: Europe (0.68)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Imitation Learning in Continuous Action Spaces: Mitigating Compounding Error without Interaction

Zhang, Thomas T., Pfrommer, Daniel, Matni, Nikolai, Simchowitz, Max

arXiv.org Machine LearningJul-29-2025

We study the problem of imitating an expert demonstrator in a continuous state-and-action dynamical system. While imitation learning in discrete settings such as autoregressive language modeling has seen immense success and popularity in recent years, imitation in physical settings such as autonomous driving and robot learning has proven comparably more complex due to the compounding errors problem, often requiring elaborate set-ups to perform stably. Recent work has demonstrated that even in benign settings, exponential compounding errors are unavoidable when learning solely from expert-controlled trajectories, suggesting the need for more advanced policy parameterizations or data augmentation. To this end, we present minimal interventions that provably mitigate compounding errors in continuous state-and-action imitation learning. When the system is open-loop stable, we prescribe "action chunking," i.e., predicting and playing sequences of actions in open-loop; when the system is possibly unstable, we prescribe "noise injection," i.e., adding noise during expert demonstrations. These interventions align with popular choices in modern robot learning, though the benefits we derive are distinct from the effects they were designed to target. Our results draw insights and tools from both control theory and reinforcement learning; however, our analysis reveals novel considerations that do not naturally arise when either literature is considered in isolation.

artificial intelligence, machine learning, trajectory, (17 more...)

arXiv.org Machine Learning

2507.09061

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Ground > Road (0.65)
Automobiles & Trucks (0.47)
Energy (0.46)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Directly Forecasting Belief for Reinforcement Learning with Delays

Wu, Qingyuan, Wang, Yuhui, Zhan, Simon Sinong, Wang, Yixuan, Lin, Chung-Wei, Lv, Chen, Zhu, Qi, Schmidhuber, Jürgen, Huang, Chao

arXiv.org Artificial IntelligenceJun-10-2025

Reinforcement learning (RL) with delays is challenging as sensory perceptions lag behind the actual events: the RL agent needs to estimate the real state of its environment based on past observations. State-of-the-art (SOTA) methods typically employ recursive, step-by-step forecasting of states. This can cause the accumulation of compounding errors. To tackle this problem, our novel belief estimation method, named Directly Forecasting Belief Transformer (DFBT), directly forecasts states from observations without incrementally estimating intermediate states step-by-step. We theoretically demonstrate that DFBT greatly reduces compounding errors of existing recursively forecasting methods, yielding stronger performance guarantees. In experiments with D4RL offline datasets, DFBT reduces compounding errors with remarkable prediction accuracy. DFBT's capability to forecast state sequences also facilitates multi-step bootstrapping, thus greatly improving learning efficiency. On the MuJoCo benchmark, our DFBT-based method substantially outperforms SOTA baselines. Code is available at https://github.com/QingyuanWuNothing/DFBT.

forecasting belief, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2505.00546

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback